Modeling presence or absence is a classic problem involving mixture models, specifically random variables that are zero-inflated. Extra zeros are encountered when we model presence or absence because zeros arise from two conditions: truly absent individuals and individuals present but undetected. This means we need a model for the process that controls occupancy, the true state, and model of the data that accounts for detection. This is often our starting point in Bayesian analysis – there is a true, unobserved state we seek to understand using a model of a process. We take imperfect observations on that state and must correct them using a model of the data.
R libraries needed for this lab
You need to load the following libraries. Set the seed to 10 to compare your answers to ours.
A fundamental question in landscape ecology seeks to understand how landscape structure shapes variation in habitat use by species. We will use data from the Swiss Survey of Common Breeding Birds, courtesy of Royle and Dorazio (2008), to model habitat occupancy by a resident bird in the Swiss Alps, the willow tit (Parus montanus). The data come from annual surveys of one km2 quadrats distributed across Switzerland (Fig. 1). Surveys are conducted during the breeding season on three separate days, but some quadrats have missing data so that the number of replicate observations is fewer than three.
During each survey, an observer records every visual or acoustic detection of a breeding species (we do not differentiate between these two types of detection in this problem) and marks its location using a global positioning system or, in earlier years, a paper map. We assume that the true state (occupied or unoccupied) does not change among sample dates, an assumption known as closure. This assumption is reasonable because we are observing a resident species during the breeding season.
Fig. 1. The willow tit (left, credit: Francis C. Franklin) is one of 70 bird species that are surveyed annually for abundance in 267 1 km2 sampling units distributed across Switzerland (right, credit: the Swiss Ornithological Institute).
We want to understand the influence of forest cover and elevation on the distribution of the willow tit. The data frame SwissBirds has the number of times a quadrat (quadrat) was searched (numberVisits) and the number of times willow tits were detected (numberDetections). We have covariates on forest canopy cover (forestCover) as well as elevation in meters (elevation) for each quadrat surveyed. Data on detection each day on each quadrat (0 or 1) are also available. Develop a model of the influence of forest cover and elevation on the distribution of willow tits. Your model should allow estimation of the optimum elevation of willow tit habitat at the mean forest cover, where optimum elevation is defined as the elevation where probability of occupancy is maximum.
Diagram the network of knowns and unknowns.
Write a mathematical expression for the posterior and the joint distribution.
Modify your model to include the effect of search time and wind speed (measured at each quadrat on each day) on detection probability. Draw a DAG and write the posterior and joint distributions. In so doing, assume that posterior predictive checks of a preliminary detection model revealed that you need to include an explicit variance term for the detection probability.
You would need to use a Bernoulli instead of a binomial likelihood if covariates varied by site and day. You would need to model detection probability as a beta distributed random variable to explicitly include a variance term.
Approximate the marginal posterior distributions of parameters in the forest and elevation model with constant detection probability (the first one, above) using JAGS. Conduct posterior predictive checks. Some hints: 1) You will need to standardize the covariates by subtracting the mean and dividing by the standard deviation for each observation in the elevation and forest cover data. Use the scale function to do this (it will drastically speed convergence). 2) You must give initial values of 1 to all unknown 0 or 1 z states.
{ # Extra bracket needed only for R markdown files
sink("SwissBirds.R")
cat("
model {
# priors
p ~ dbeta(1, 1)
for (i in 1:4){
beta[i] ~ dnorm(0, .368)
}
# likelihood
for (i in 1:N){
z[i] ~ dbern(psi[i])
logit(psi[i]) <- beta[1] + beta[2] * elevation[i] + beta[3] * elevation[i]^2 + beta[4] * forestCover[i]
y[i] ~ dbin(z[i] * p, n[i])
y.new[i] ~ dbin(z[i] * p, n[i])
}
# derived quantities
elevationMaxScaled <- -beta[2] / (2 * beta[3])
elevationMax <- elevationMaxScaled * sdElevation + muElevation
for (j in 1:length(elevation_pred_std)) {
logit(psiPredict[j]) <- beta[1] + beta[2] * elevation_pred_std[j] + beta[3] * elevation_pred_std[j]^2
}
#posterior predictive checks
mean.y <- mean(y[])
mean.y.new <- mean(y.new[])
p.value.mean <- step(mean.y.new-mean.y)
sd.y <- sd(y[])
sd.y.new <- sd(y.new[])
p.value.sd <- step(sd.y.new-sd.y)
}
", fill = TRUE)
sink()
} # Extra bracket needed only for R markdown files
elevation <- as.vector(scale(SwissBirds$elevation))
muElevation <- mean(SwissBirds$elevation)
sdElevation <- sd(SwissBirds$elevation)
# for plotting probability of occupancy vs unstandardized elevation
elevation_pred = seq(500, 2500, 10)
#Standardized for making predictions of probability of occupancy at new
elevation_pred_std = (elevation_pred - muElevation) / sdElevation
forestCover <- as.vector(scale(SwissBirds$forestCover))
muforestCover <- mean(SwissBirds$forestCover)
sdforestCover <- sd(SwissBirds$forestCover)
inits = list(
list(z = rep(1, nrow(SwissBirds)), p = runif(1, 0, 1), beta = runif(4, -2, 2)),
list(z = rep(1, nrow(SwissBirds)), p = runif(1, 0, 1), beta = runif(4, -2, 2)),
list(z = rep(1, nrow(SwissBirds)), p = runif(1, 0, 1), beta = runif(4, -2, 2)))
data = list(
N = nrow(SwissBirds),
elevation = as.double(elevation),
forestCover = as.double(forestCover),
n = as.double(SwissBirds$numberVisits),
y = as.double(SwissBirds$numberDetections),
muElevation = as.double(muElevation),
sdElevation = as.double(sdElevation),
elevation_pred_std = as.double(elevation_pred_std))
Summarize the parameters and check chains for convergence. Exclude the predictions of \(\psi\) from the summary if they were included in the coda object. What can you conclude about model fit?
What can you conclude about the relative importance of elevation and forest cover in controlling the bird’s distribution? Plot the median probability of occupancy and the 95% highest posterior density interval as a function of elevation at the mean of forest cover. Find the optimum elevation of willow tit habitat at the mean forest cover, where optimum elevation is defined as the elevation where probability of occupancy is maximum. Plot a normalized histogram of MCMC output for the optimum elevation at the average forest cover. Overlay 0.95 highest posterior density limits on the optimum elevation.
We are entitled to conclude that elevation is roughly twice as important as forest cover because its coefficient the elevation coefficient is twice as large and because the data have been standardized.You need to formulate a deterministic model that has a maximum and that is relatively easy to differentiate, something like \(\mu_i = \beta_0+ \beta_1x_{1,i}+\beta_2x_{2,i}^2+\beta_3x_{3,1}\) where \(\mathbf{x}_2\) are the data on elevation and \(\mathbf{x}_3\) are the data on forest cover, when = 0 at the mean. Dropping the i subscript for more general notation, we now have:
\[\mu=\beta_0+ \beta_1x_1+\beta_2x_2^2.\]
To find the maximum, we take the first derivative \(\frac {d\mu}{dx}=\beta_1+ 2\beta_2 x_{1}\), set it to zero \(0=\beta_2+ 2\beta x_1\), and solve for \(x_1\), \(x_1=\frac{-\beta_1}{2\beta_2}\). You include this result as a derived quantity in your JAGS code.
Fig.2. Probability of occupancy at the mean forest cover as a function of elevation (left panel) and the posterior density of the optimum elevation at the mean forest cover (right panel). Dashed define the .95 highest posterior density interval. also known as .95 equal-tailed Bayesian credible intervals.
References
Royle, J.A., and R.M. Dorazio. 2008. Hierarchical Modeling and Inference in Ecology: The Analysis of Data from Populations, Metapopulations, and Communities. Academic Press, London, United Kingdom.